
Remove cudf.Scalar from shift/fillna #17922

Merged

Conversation

mroeschke (Contributor)

Description

Toward #17843

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke added the Python (Affects Python cuDF API), improvement (Improvement / enhancement to an existing function), and non-breaking (Non-breaking change) labels on Feb 5, 2025
@mroeschke self-assigned this on Feb 5, 2025
@mroeschke requested a review from a team as a code owner on February 5, 2025 18:48
@@ -761,13 +760,21 @@ def _check_scatter_key_length(
f"{num_keys}"
)

def _scalar_to_plc_scalar(self, scalar: ScalarLike) -> plc.Scalar:
Contributor

This feels a bit odd as a class method. I feel like a free function that accepts a dtype would be more appropriate; then we could call it with col.dtype. Scoping-wise this doesn't feel like a Column method. Plus, it would then directly mirror pa_scalar_to_plc_scalar.
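For illustration, a minimal sketch of the free-function shape suggested here (assuming pyarrow plus pylibcudf's plc.interop.from_arrow accepting a pyarrow scalar; the helper name and conversion path are illustrative, not the code in this PR):

```python
import pyarrow as pa
import pylibcudf as plc


def scalar_to_plc_scalar(scalar, pa_type: pa.DataType) -> plc.Scalar:
    """Hypothetical free-function counterpart to the method above.

    Callers would pass the Arrow type derived from col.dtype, directly
    mirroring the existing pa_scalar_to_plc_scalar helper.
    """
    # pa.scalar applies the requested type (None becomes a typed null);
    # from_arrow then converts the Arrow scalar into a pylibcudf Scalar.
    return plc.interop.from_arrow(pa.scalar(scalar, type=pa_type))
```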

Contributor

I guess there is currently a small benefit because we can override this method for decimal columns to get the specialized behavior that we need, but I think that we don't need that any more (see my comment on that class).

Contributor Author (mroeschke)

Now that I've closed #18035, which was an attempt to avoid this decimal special casing, do you still feel strongly about having this as a free function? I chose a class method because, as you mentioned, I am able to customize this for decimal, and it's a little more obvious when I could remove this in the future.

Contributor

No, I think it's fine to leave it as is for now.

isinstance(fill_value, np.datetime64)
and self.time_unit != np.datetime_data(fill_value)[0]
):
# TODO: Disallow this cast
Contributor

I feel like a lot of your PRs have had these kinds of comments. Do they all fall into similar buckets? Should we open some issues for tracking?

Contributor Author (mroeschke)

Actually, I double-checked pandas, and our casts here match the pandas behavior for this method (although I'm not fond of it).
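For context, a quick NumPy-only illustration of what the unit check in the snippet above compares (a minimal sketch; the column's time_unit is assumed to be the unit string of its datetime64 dtype):

```python
import numpy as np

fill_value = np.datetime64("2024-01-01", "ms")

# np.datetime_data returns a (unit, count) pair for a datetime64/timedelta64
# dtype; element [0] is the unit string compared against the column's unit.
print(np.datetime_data(fill_value.dtype))             # ('ms', 1)
print(np.datetime_data(np.dtype("datetime64[ns]")))   # ('ns', 1)
# A mismatch like ms vs ns is what triggers the cast marked with the TODO.
```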

Contributor

The question about tracking TODOs still applies, but I'm good with holding off on that. Grepping the codebase for these small things is OK with me for now, given how much we're churning internally anyway.

@@ -168,16 +170,35 @@ def _binaryop(self, other: ColumnBinaryOperand, op: str):

return result

def _scalar_to_plc_scalar(self, scalar: ScalarLike) -> plc.Scalar:
Contributor

Now that #17422 is merged I think we can stop special-casing this and see if anything breaks. WDYT? It does mean that decimal conversions in tests will fail if run with an older version of pyarrow, but I think that's an OK tradeoff. We might have to put some conditional xfails into our test suite for the "oldest" test runs.
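As a rough sketch of the conditional-xfail idea for the "oldest" dependency runs (assuming a pyarrow version gate; the marker name and version threshold are illustrative):

```python
import pyarrow as pa
import pytest
from packaging.version import parse

# Hypothetical marker: expect failure when running against a pyarrow that
# predates the decimal support relied on after #17422.
requires_recent_pyarrow = pytest.mark.xfail(
    parse(pa.__version__) < parse("19.0.0"),
    reason="decimal conversion requires a newer pyarrow",
)


@requires_recent_pyarrow
def test_decimal_roundtrip():
    ...
```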

Contributor Author (mroeschke)

I opened #18035, dedicated to avoiding the decimal special casing.

We can discuss on that PR, but IIUC, to avoid these conversions on the Python side we would need pyarrow APIs that were introduced in pyarrow 19.
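For reference, a hedged sketch of the kind of pyarrow 19+ API being referred to (assuming the narrow decimal type constructors added around that release; the exact API cudf needs may differ):

```python
import pyarrow as pa

# Assumption: recent pyarrow exposes 32/64-bit decimal types alongside
# decimal128/decimal256; on older releases these constructors do not exist,
# which is why the Python-side decimal special casing is still needed there.
dec32 = pa.decimal32(7, 2)    # precision 7, scale 2
dec64 = pa.decimal64(15, 4)   # precision 15, scale 4
print(dec32, dec64)
```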

Contributor

Good call. Let's discuss there; I responded in #18035 (comment).

@vyasr (Contributor) commented Mar 4, 2025

Approving since we're putting a pin in #18035.

@mroeschke (Contributor Author)

/merge

1 similar comment
@vyasr (Contributor) commented Mar 4, 2025

/merge

@rapids-bot (bot) merged commit 45d8066 into rapidsai:branch-25.04 on Mar 4, 2025
106 checks passed